HUMAN-NATURAL STRING COMPARE FOR
FILESYSTEMS

The invention pertains to the art of computer filesystem 
directory search, maintenance, and file location. In
particular, the invention pertains to methods for comparing 
filename strings while locating files in computer filesystem 
directories. 

BACKGROUND OF THE INVENTION

Computer filesystems are typically accessed through filesystem
directories. Filesystem directories are typically databases, often
organized hierarchically, containing a file entry for each file on
the system. File entries typically include a filename, a file
pointer that directly or indirectly indicates where the file is
located on the filesystem, and information about the file, often
including file status flags and creation and access history
information. Filesystem directories are accessed frequently.

Files on computer filesystems are usually referred to by
filenames. Each time a file on a filesystem is "opened", or
accessed for the first time in a given program, it is necessary
to search the filesystem directory for file entries having a
filename matching the filename of the file to be opened.

Each file entry typically has a file status field and a file
pointer field in addition to the filename field.

If a file entry having a matching filename is found, the file
status may then be tested for read, execute, and write
permissions, as well as any file-lock information. The file
pointer may then be followed to locate any existing file
contents, which may then be read, overwritten, or deleted. If
the file entry has file status indicating that it is a subdirectory,
the file pointer may also be followed to that subdirectory,
where a further search may be performed for file entries
having a filename matching the remaining characters of the
filename of the file to be opened.

Filesystem directories may be organized in many ways. A
common directory organization, used with many Microsoft
filesystems among others, has multiple unsorted file entries
in a list of file entries for each directory. Each file entry has
status indicating whether the entry represents a valid file.
Locating a file is then done by comparing the filename being
searched for to the filename of successive file entries for
valid files, ignoring any entries marked invalid, until all file
entries have been examined or a match is found. Directories
having this structure often require numerous comparisons
for each file "open" operation. It is therefore advantageous
to quickly perform each comparison operation when searching
filesystem directories.
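
The linear search over unsorted entries described above can be
sketched in C as follows. This is a minimal illustration only:
the struct layout, the field width NAME_MAX_LEN, and the function
name find_entry are assumptions made for the example and do not
come from any particular filesystem.

```c
#include <stddef.h>
#include <string.h>

#define NAME_MAX_LEN 32            /* hypothetical fixed field width */

/* Hypothetical in-memory file entry; field names are illustrative. */
struct file_entry {
    int           valid;               /* nonzero if the entry is a live file */
    char          name[NAME_MAX_LEN];  /* NUL-terminated filename */
    unsigned long file_pointer;        /* where the file contents start */
};

/* Scan the unsorted entry list and return the index of the first valid
 * entry whose name equals `wanted`, or -1 if no entry matches.
 * `*compares` receives the number of string comparisons performed. */
long find_entry(const struct file_entry *dir, size_t count,
                const char *wanted, size_t *compares)
{
    *compares = 0;
    for (size_t i = 0; i < count; i++) {
        if (!dir[i].valid)
            continue;                  /* ignore entries marked invalid */
        (*compares)++;
        if (strcmp(dir[i].name, wanted) == 0)
            return (long)i;
    }
    return -1;
}
```

Every valid entry that precedes the match costs one full string
comparison, which is why the cost of each individual comparison
matters.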

Another common directory organization, used with many
UNIX and similar operating systems, has multiple file
entries in a list of file entries for each directory. Each file
entry has a filename string, a length of that filename string,
and a pointer, in the form of an inode number, to an inode
associated with the file. The inode associated with the file
has file status information and file location information; the
file location information may be direct or indirect through
further inodes.
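
A rough C sketch of this organization is given below. The field
widths, the constant N_DIRECT, the use of inode number 0 to mean
"not found", and the function name lookup_inode are all
assumptions chosen for illustration; real UNIX filesystems differ
in their details.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_DIRECT 4                 /* hypothetical count of direct pointers */

/* Hypothetical inode: file status plus file location information. */
struct inode {
    uint16_t mode;                 /* file status: type and permissions */
    uint32_t direct[N_DIRECT];     /* block numbers holding file data */
    uint32_t indirect;             /* block of further pointers, 0 if unused */
};

/* Hypothetical directory entry: inode number, name length, name. */
struct dir_entry {
    uint32_t inode_number;         /* identifies the inode for this file */
    uint16_t name_len;             /* length of the name in bytes */
    char     name[28];             /* filename characters */
};

/* Return the inode number of the entry named `wanted`, or 0 if the
 * directory has no such entry (0 as "not found" is an assumption). */
uint32_t lookup_inode(const struct dir_entry *dir, size_t count,
                      const char *wanted)
{
    size_t wanted_len = strlen(wanted);
    for (size_t i = 0; i < count; i++) {
        if (dir[i].name_len == wanted_len &&
            memcmp(dir[i].name, wanted, wanted_len) == 0)
            return dir[i].inode_number;
    }
    return 0;
}
```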

Many filesystem directory search engines have a string
comparison routine that compares a filename string with a
filename string stored in each file entry. Many of these string
comparison routines operate by successively comparing
characters, bytes, or words of the strings in order from the
first character, byte, or word of the strings to the last
character, byte, or word of the strings. These engines
typically stop comparing the text strings when a mismatch is
found. With string comparison routines of this type, a
mismatch will be detected in a time that increases with the
number of characters, bytes, or words that must be compared
before the mismatch is detected.
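
The first-to-last comparison described above might look like the
following C sketch. The function name forward_compare and the
examined counter are hypothetical additions used only to make the
cost visible; the sketch assumes both strings have the same known
length.

```c
#include <stddef.h>

/* Compare two strings of equal length `len` from the first character
 * to the last. Returns 1 on a match and 0 on a mismatch. `*examined`
 * receives the number of character pairs compared before stopping. */
int forward_compare(const char *a, const char *b, size_t len,
                    size_t *examined)
{
    *examined = 0;
    for (size_t i = 0; i < len; i++) {
        (*examined)++;
        if (a[i] != b[i])
            return 0;              /* stop at the first mismatch */
    }
    return 1;
}
```

For "project_a.c" and "project_b.c" the routine examines nine
character pairs before it reaches the differing character, so the
shared prefix is paid for on every mismatching entry.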

It is also known that at least some filesystem directory
databases store filenames in file entries in a field of fixed
width, and that some filesystems link multiple fixed-length
fields together to store long filenames. It is known in the art
of computer string handling that a string-length byte may be
stored ahead of the first character of a string, and that such
a string-length byte is convenient for use in performing string
manipulations.

SUMMARY OF THE INVENTION 

It has been observed that many files have filenames that 
are identical or similar in an initial portion of the filename,
differing in later portions of the filename. A new directory 
search engine therefore compares a filename with filename 
fields in each file entry in order from the last character of 
each string to the first character of each string. The directory 
search engine stops comparing the strings when a mismatch 
is found. 

On average, the directory search engine of the present 
invention identifies filename mismatches more quickly than 
prior art search engines because many filenames are identical
or similar in an initial portion of the filename. Since many
comparisons are performed each time a file is located in a 
typical filesystem, a considerable savings in processor time 
may be attained. 
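
As a minimal sketch of the last-to-first comparison summarized
above, consider the following C function. The name
reverse_compare and the examined counter are hypothetical; the
sketch assumes the length of each string is already known, and it
treats differing lengths as an immediate mismatch.

```c
#include <stddef.h>

/* Compare two strings from the last character to the first.
 * Returns 1 on a match and 0 on a mismatch. `*examined` receives
 * the number of character pairs compared before stopping. */
int reverse_compare(const char *a, size_t a_len,
                    const char *b, size_t b_len, size_t *examined)
{
    *examined = 0;
    if (a_len != b_len)
        return 0;                  /* different lengths cannot match */
    for (size_t i = a_len; i > 0; i--) {
        (*examined)++;
        if (a[i - 1] != b[i - 1])
            return 0;              /* stop at the first mismatch */
    }
    return 1;
}
```

For "project_report_01.txt" and "project_report_02.txt" this
routine stops after five character pairs, where a first-to-last
comparison would examine seventeen. The benefit depends on how
files are named: names that share a long suffix, such as a common
extension, and differ only near the front gain little or nothing.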

BRIEF DESCRIPTION OF THE DRAWINGS 

The aforementioned and other features and objects of the 
present invention and the manner of attaining them will 
become more apparent and the invention itself will be best 
understood by reference to the following description of a 
preferred embodiment taken in conjunction with the accom- 
panying drawings, wherein: 

FIG. 1A is an illustration of the structure of a common 
filesystem directory database, showing the filename fields of 
file entries; 

FIG. 1B is a block diagram of a machine having a filesystem
directory database, showing how it may be connected to a
network;

FIG. 1C is an illustration similar to FIG. 1A of another
structure of a filesystem commonly used with UNIX and
other operating systems;

FIG. 2 is a flowchart of a prior-art directory-search
string-comparison function;

FIG. 3 is a flowchart of a directory-search string-comparison
function of the present invention; and

FIG. 4 is an illustration of a structure of another common 
filesystem directory database for which the reverse-order 
string comparison of the present invention is particularly 
useful. 

DESCRIPTION OF A PREFERRED 
EMBODIMENT 

In a typical disk-memory filesystem directory database, 
there is a "root" or "top-level" directory 100, normally 
stored on the filesystem 101 and beginning at a predeter- 
mined location. This top-level directory 100 contains mul- 
tiple file entries, 103, 104, and 105. Each file entry 103 has 
a file pointer 108 that indicates, directly or indirectly, the 
location of the file 109 corresponding to that file entry 103 
on the filesystem disk 101. It is known that file pointer 108
may indicate the file location indirectly through a File
Allocation Table, as in some Microsoft FAT-16 and FAT-32
filesystems. Each file entry 103 also has a filename 110 field
containing a name for the file, and associated file status,
length, and date fields 111.

When a file 130 must be located by name, a current
default "path" may be joined to the name. Alternatively, files
may be located by path and name. In both cases, the path and
name are parsed into a name list 115 of one or more filename
fields, representing the filenames of all the subdirectory files,
such as subdirectory name 116, that must be located in order
to find the file, and the filename 117 of the file to be found.
The filename 117 of the file is the last entry of this name list.

Next, the "root" or "top-level" directory is read by a
computer (not shown), and then searched for any occurrence
of the first name of the name list 115. On many filesystems,
this search is performed by comparing the filename string of
the first name of the name list 115 to filename strings stored
in the filename fields (110 and 120) of each successive file
entry (103, 105, and 104) until a filename field 120 is found
that matches the first name of the name list 115. Many string
comparison operations may need to be performed before a
match is found, because there may be several hundred file
entries to search through in a single top level directory or
subdirectory of the directory database.

Should the file status field 121 of the file entry 104 having
the matching filename field 120 be marked indicating that
the associated file is a subdirectory, and the matching
filename of the name list 115 not be the last name of the
name list, the file pointer 122 of the file entry 104 is followed
to the associated subdirectory file 125. This subdirectory file
125 also contains a sequence of file entries, as with file entry
126. The subdirectory file 125 is recursively searched for a
file entry 126 having a filename field 127 that matches the
next filename string 117 of the name list 115 until all entries
of the name list have been considered. The file pointer 128
of the last found file entry 126 may then be followed to the
file 130. There may be many subdirectory files in the top
level or root directory, and there may be subdirectory files in
subdirectory file 125.

Such a filesystem is typically implemented on a computer,
such as computer 150 (FIG. 1B), which may but need not be
connected through a network interface controller 151 of the
computer to the Internet 152 or other network. A filesystem
directory search engine is loaded into memory 153 of the
computer 150, while the filesystem directory database and
its files are initially located on one or more disk drives 155.
The computer also typically has a motherboard 156 with at
least one CPU wherein the filesystem directory search
engine executes. Part of the filesystem directory database,
and part of the files of the filesystem, may be cached in the
memory 153 of the computer 150. The directory search
engine is invoked whenever a program, which may be an
operating system or user interface program, and which may
be responding to a request transmitted to the computer 150
from another computer 160 of the network 152, running on
the computer 150 requests access to a file. If the request
originated on the other computer 160 of the network, that
request may or may not have been transmitted through
multiple sets of network hardware 161, bridges, switches,
and firewalls 162, as are known in the art of networks.

Another common directory structure, also generally
implemented on a computer, is commonly used with UNIX
and similar operating systems, including the Solaris (a
trademark or registered trademark of SUN Microsystems in
the United States and other countries). In this directory
structure, a directory file 179 has zero or more file entries
such as file entry 180. Each file entry has an inode number
181, a file name string 182, and a length 183 of the file name
string. Each entry may also have a length (not shown) of the
file entry. The inode number 181 of each file entry points to
an inode record 184, which contains file status information
185 and one or more location pointers 188. The location
pointers 188 may point directly 190, or indirectly 191
through further inodes 192 having further pointers 194, to
the data of a file 195 on the disk 196. As with other
filesystems, a disk file may be a subdirectory 198, containing
further file entries 199.

Clearly, multiple string-comparison operations are often
required when finding a specific file in a directory database
of this or similar structure. Therefore it is advantageous if
each string-comparison executes quickly.

Typical directory search engines perform each string-comparison
operation by first initializing 200 (FIG. 2) a pair
of pointers, or array indexes; one of these pointers is
initialized to point to the beginning of the appropriate name
string of the name list 115 (FIG. 1A), the other to the beginning
of a filename entry in the directory database. String characters
from the name list and the directory database are then
fetched and compared 201; if a mismatch is found the string
compare reports a mismatch 202, whereupon the search
engine reinitializes to check the next file entry, if any, at the
same level of the directory database. If a match is found the
pointers are advanced 203, and a test is performed 205 to
determine if all string characters have been tested, e.g., a
done test or check. This test may be performed in any of
several ways; a counter may be decremented as illustrated if
the length of the strings is a constant or is known, as in the
case of systems that store strings with a string length byte
located in memory immediately before the first character of
the string. Alternatively, a NULL character may be placed at
the end of each string; completion of the string-compare
operation is then found when the next character indicated by
the pointers is a NULL.

It has been observed that it is common for filenames to
occur in groups, where the first characters of all filenames in
the group are the same, but one or more later characters,
often characters near the end of each filename string, are
different. Humans often do this when naming files because
each group of files may be associated with a different
program, utility, project, or other classification in their
minds. When comparing filename strings during a search for
a filename in a group of this type with a string-comparison
operation like that of FIG. 2, it is necessary for the
string-comparison operation to compare many characters of each
filename before a match or mismatch condition can be
found. The more characters compared before each match or
mismatch is determined, the more computer processor time
is required by the multiple string comparison operations
required to locate a file.

The directory database search engine of the present invention
has a string comparison operation that compares the
filename strings in reverse order. It has been found that if
files are named in groups as described in the preceding
paragraph, less computer processor time is required to locate
a file with this string comparison operation than with the
traditional string comparison operation of FIG. 2.

The string comparison operation of the directory database
search engine of the present invention assumes that the
filename string lengths are stored with the strings;
alternatively the strings may be all of equal length. It begins by
comparing 300 (FIG. 3) the lengths of the strings. If the
lengths are different, a mismatch has already been found. If
the lengths are not different, two pointers are initialized to
point to the last character, or word, of each of the strings: the
end of the filename list 115 entry 116 being searched for, and
the end of the filename field 110 of the file entry 103 being
tested for a match.

Once the pointers are initialized, the characters pointed to
by the pointers are compared 303 (FIG. 3). If a mismatch is
detected 304, the string comparison operation ceases and
reports that a mismatch has been found; the directory
database search engine may then initiate a filename string
compare operation of another file entry in the directory
being searched. If the characters matched, a test 306 is made
to determine if all characters of the strings have been
compared. The preferred embodiment performs this test 306
by checking if one of the pointers has been decremented to
point 307 to the beginning of the associated string.
Alternatively, a loop counter may be used as with the
prior-art compare operation of FIG. 2.

When all characters of the strings have been compared
the compare operation ceases and reports 308 that a match
has been found. If not all characters have been compared the
pointers are decremented 310 to point to the next earlier
character of both strings, and the characters pointed to by the
decremented pointers are compared 303. The process repeats
until either a mismatch is found or all characters of the
strings have been compared.

The file compare operation of the present invention is
ideal for use with the directory database structure of FIG.
1C, as used on many UNIX and UNIX-like operating
systems, including the UFS filesystem used with the Solaris
operating system. When the file compare operation of the
present invention is used with the UFS filesystem on a
thirty-two bit machine, characters are preferably compared
in words of four characters per word, instead of character by
character as heretofore discussed, since filenames are
guaranteed to be NULL terminated and padded to the end of a
word with NULL characters. Comparison may occur in
segments of other sizes, such as two or eight characters, on
other machines, as appropriate for the computer on which
the directory database search engine is run and the structure
of the directory database.

While it is possible to compute the filename string lengths
on the fly, it is preferred that lengths be stored
in the directory files because having to determine these
lengths while searching the directories can cost substantial
processor time.

In an alternative embodiment of the present invention,
intended for use with a particular common filesystem, each
filename of the directory database is divided among one or
more file entries, where each file entry contains a
fixed-length filename field 400. This common filesystem was
designed for a degree of compatibility with old software that
used fixed-length filenames small enough to fit in the
fixed-length filename fields. If a file has a name having more
characters than fit within the fixed-length filename field 400
of a single file entry, such as entry 401, multiple file entries
are allocated to that file, including following entry 402. A
portion of the filename is stored in the fixed-length filename
field 403 of the first file entry 401. Each file entry 401
containing the filename contains a link or pointer whereby
any further file entries, such as file entry 402, containing
further portions of the filename may be found. Each further
file entry 402 may have a link to a further file entry or an
indication that it is the last file entry containing the file's
name. Each further file entry 402 also contains a further
fixed-length filename field 405 containing a further section
of the filename. At least one such file entry also contains a
pointer to the file, as with the previously described
embodiment.

In this alternative embodiment, the directory database
search engine compares each section 400 and 405 of the
filename in reverse order as heretofore described and as
illustrated in FIG. 3. The sections 400 and 405 of the
filename are, however, compared sequentially from first
section 400 to last until a mismatch is found or all sections
of the filename are found to match.

While there have been described above the principles of
the present invention in conjunction with specific
implementation thereof, it is to be clearly understood that the
foregoing description is made only by way of example and
not as a limitation to the scope of the invention. Particularly,
it is recognized that the teachings of the foregoing disclosure
will suggest other modifications to those persons skilled in
the relevant art. Such modifications may involve other
features which are already known per se and which may be
used instead of or in addition to features already described
herein. Although claims have been formulated in this
application to particular combinations of features, it should be
understood that the scope of the disclosure herein also
includes any novel feature or any novel combination of
features disclosed either explicitly or implicitly or any
generalization or modification thereof which would be
apparent to persons skilled in the relevant art, whether or not
such relates to the same invention as presently claimed in
any claim and whether or not it mitigates any or all of the
same technical problems as confronted by the present
invention. The applicants hereby reserve the right to formulate
new claims to such features and/or combinations of such
features during the prosecution of the present application or
of any further application derived therefrom.

In particular, it is anticipated that reverse-order string
comparison of the present invention may be useful in
searching many forms of directory structures, including
directory structures that have directory database structures
different from the specific database structure described
herein. It is also anticipated that the reverse-order string
comparison of the present invention may be implemented
through word or longword compare operations in place of the
character compare operations herein described; such an
implementation is particularly useful if the strings to be
compared are stored in a packed format having multiple
characters per word or longword.

An equivalent embodiment of the present invention
declares a string array containing the filename list 115
filename being searched for in place of the pointer
heretofore described, the array being located at a particular
base address. In this embodiment, there is an index into the
string array that indicates the character of the filename being
compared at any particular iteration of the comparison loop.
This embodiment is equivalent because an effective address,
or pointer, is constructed by adding the base address to the
index, followed by a read-and-compare-character operation
from that effective address, at each iteration of the loop.

A computer program product is any machine-readable
media, such as an EPROM, ROM, RAM, DRAM, disk
memory, or tape, having recorded on it computer readable
code that, when read by and executed on a computer,
instructs that computer to perform a particular function or
sequence of functions.
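
The word-wise variant of the reverse-order comparison, described
above for filenames that are NULL terminated and padded with NULL
characters to a word boundary, can be sketched in C as follows.
This is an illustration under stated assumptions rather than the
implementation of any actual filesystem: the four-byte word size,
the helper padded_size, and the function name
reverse_word_compare are all hypothetical, and both buffers are
assumed to be padded as described.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes occupied by a name of `len` characters once it has been
 * NUL-terminated and padded with NULs to a 4-byte boundary. */
size_t padded_size(size_t len)
{
    return (len + 1 + 3) & ~(size_t)3;
}

/* Compare two padded names one 4-byte word at a time, starting with
 * the last word and moving toward the first. Returns 1 on a match
 * and 0 on a mismatch. Both buffers must hold padded_size(len) bytes. */
int reverse_word_compare(const char *a, size_t a_len,
                         const char *b, size_t b_len)
{
    if (a_len != b_len)
        return 0;                  /* different lengths cannot match */

    size_t words = padded_size(a_len) / 4;
    for (size_t w = words; w > 0; w--) {
        uint32_t wa, wb;
        /* memcpy avoids alignment and aliasing problems when loading */
        memcpy(&wa, a + (w - 1) * 4, sizeof wa);
        memcpy(&wb, b + (w - 1) * 4, sizeof wb);
        if (wa != wb)
            return 0;              /* stop at the first differing word */
    }
    return 1;
}
```

Because the padding bytes are guaranteed to be NULL in both
names, the final partial word can be compared like any other word
without masking.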
What is claimed is:

1. A method for searching a file directory, comprising:

receiving a filename string to search for in the file
directory, wherein the file directory includes a plurality
of filename strings;

identifying a filename string in the file directory for
comparing to the searched for filename string;

comparing a string length of the searched for filename
string to a string length of the identified filename string;

when the string lengths differ based on the comparing,
reporting a filename mismatch and then repeating the
identifying;

when the string lengths are equal based on the comparing,
setting a first pointer to reference a last character of the
identified filename string;

setting a second pointer to reference a last character of the 
searched for filename string; 

detecting a filename mismatch if the character of the 
identified filename string referenced by the first pointer 
does not match the character of the searched for file- 
name string referenced by the second pointer; 

checking if any more characters of the filename strings 
remain to be compared and reporting a filename match 
if no more characters remain to be compared; 

resetting the first pointer to reference an earlier character 
of the identified filename string; 

resetting the second pointer to reference an earlier char- 
acter of the searched for filename string; and 

repeating the steps of detecting, checking, and resetting 
until a filename mismatch is detected or a filename 
match is reported. 

2. The method of claim 1, wherein the string length of the 
identified filename string is stored in the file directory with 
the identified filename string and the comparing includes 
retrieving the stored string length. 

3. A computer program product, tangibly embodied on a 
computer readable medium comprising: 

computer readable code for receiving a filename string to
search for in a file directory, wherein the file directory
includes a plurality of filename strings;

computer readable code for identifying a filename string
in the file directory for comparing to the searched for
filename string;

computer readable code for comparing a string length of
the searched for filename string to a string length of the
identified filename string;

computer readable code for when the string lengths differ
based on the comparing, reporting a filename mismatch
and then repeating the identifying;

computer readable code for when the string lengths are
equal based on the comparing, setting a first pointer to
reference a last character of the identified filename
string;

computer readable code for setting a second pointer to
reference a last character of the searched for filename
string;

computer readable code for detecting a filename mismatch
if the character of the identified filename string
referenced by the first pointer does not match the
character of the searched for filename string referenced
by the second pointer;

computer readable code for checking if any more char- 
acters of the filename strings remain to be compared 
and reporting a filename match if no more characters 
remain to be compared; 

computer readable code for resetting the first pointer to 
reference an earlier character of the identified filename 
string; 

computer readable code for resetting the second pointer to 
reference an earlier character of the searched for file- 
name string; and 

computer readable code for repeating the steps of 
detecting, checking, and resetting until a filename mis- 
match is detected or a filename match is reported. 

4. The computer program product of claim 3, wherein the 
string length of the identified filename string is stored in the 
file directory with the identified filename string and the 
comparing includes retrieving the stored string length. 

5. A computer system tangibly embodied on a computer 
readable medium for searching a file directory, comprising: 

means for receiving a filename string to search for in the
file directory, wherein the file directory includes a
plurality of filename strings;

means for identifying a filename string in the file directory
for comparing to the searched for filename string;

means for comparing a string length of the searched for
filename string to a string length of the identified
filename string;

means for when the string lengths differ based on the
comparing, reporting a filename mismatch and then
repeating the identifying;

means for when the string lengths are equal based on the
comparing, setting a first pointer to reference a last
character of the identified filename string;

means for setting a second pointer to reference a last
character of the searched for filename string;

means for detecting a filename mismatch if the character
of the identified filename string referenced by the first
pointer does not match the character of the searched for
filename string referenced by the second pointer;

means for checking if any more characters of the filename
strings remain to be compared and reporting a filename
match if no more characters remain to be compared;

means for resetting the first pointer to reference an earlier
character of the identified filename string;

means for resetting the second pointer to reference an
earlier character of the searched for filename string; and

means for repeating the steps of detecting, checking, and
resetting until a filename mismatch is detected or a
filename match is reported.

6. The system of claim 5, wherein the string length of the 
identified filename string is stored in the file directory with 
the identified filename string and the comparing means 
includes means for retrieving the stored string length. 